Gap-free Bounds for Stochastic Multi-Armed Bandit

نویسندگان

  • A. Juditsky
  • A. V. Nazin
چکیده

We consider the stochastic multi-armed bandit problem with unknown horizon. We present a randomized decision strategy which is based on updating a probability distribution through a stochastic mirror descent type algorithm. We consider separately two assumptions: nonnegative losses or arbitrary losses with an exponential moment condition. We prove optimal (up to logarithmic factors) gap-free bounds on the excess risk of the average over time of the instantaneous losses induced by the choice of a specific action.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Anytime optimal algorithms in stochastic multi-armed bandits

We introduce an anytime algorithm for stochastic multi-armed bandit with optimal distribution free and distribution dependent bounds (for a specific family of parameters). The performances of this algorithm (as well as another one motivated by the conjectured optimal bound) are evaluated empirically. A similar analysis is provided with full information, to serve as a benchmark.

متن کامل

The Price of Differential Privacy for Online Learning

We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal Õ( √ T ) regret bounds. In the full-information setting, our results demonstrate that ε-differential privacy may be ensured for free – in particular, the regret bounds scale as O( √ T ) + Õ ( 1 ε ) . For bandit linear optimization, and as a special c...

متن کامل

Lower Bounds for Multi-armed Bandit with Non-equivalent Multiple Plays

We study the stochastic multi-armed bandit problem with non-equivalent multiple plays where, at each step, an agent chooses not only a set of arms, but also their order, which influences reward distribution. In several problem formulations with different assumptions, we provide lower bounds for regret with standard asymptotics O(log t) but novel coefficients and provide optimal algorithms, thus...

متن کامل

Bounded regret in stochastic multi-armed bandits

We study the stochastic multi-armed bandit problem when one knows the value μ(⋆) of an optimal arm, as a well as a positive lower bound on the smallest positive gap ∆. We propose a new randomized policy that attains a regret uniformly bounded over time in this setting. We also prove several lower bounds, which show in particular that bounded regret is not possible if one only knows ∆, and bound...

متن کامل

Lower bounds and selectivity of weak-consistent policies in stochastic multi-armed bandit problem

This paper is devoted to regret lower bounds in the classical model of stochastic multi-armed bandit. A well-known result of Lai and Robbins, which has then been extended by Burnetas and Katehakis, has established the presence of a logarithmic bound for all consistent policies. We relax the notion of consistency, and exhibit a generalisation of the bound. We also study the existence of logarith...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2007